binom (Binomial distribution)#
The binomial distribution models the number of successes in a fixed number of independent yes/no trials.
This notebook uses the same parameterization as scipy.stats.binom:
n = number of trials (a non-negative integer)
p = success probability per trial (a number in ([0, 1]))
Learning goals#
By the end you should be able to:
recognize when a binomial model is appropriate (and when it isn’t)
write down the PMF/CDF and key properties
derive the mean, variance, and likelihood
implement binomial sampling using NumPy only
visualize PMF/CDF and validate with Monte Carlo simulation
use scipy.stats.binom for computation and estimation workflows
Table of contents#
Title & Classification
Intuition & Motivation
Formal Definition
Moments & Properties
Parameter Interpretation
Derivations
Sampling & Simulation
Visualization
SciPy Integration
Statistical Use Cases
Pitfalls
Summary
import math
import numpy as np
import plotly.express as px
import plotly.graph_objects as go
import os
import plotly.io as pio
pio.templates.default = "plotly_white"
pio.renderers.default = os.environ.get("PLOTLY_RENDERER", "notebook")
np.set_printoptions(precision=6, suppress=True)
rng = np.random.default_rng(42)
1) Title & Classification#
Name: binom (Binomial distribution)
Type: Discrete
Support: [ k \in {0, 1, 2, \dots, n} ]
Parameter space: [ n \in {0, 1, 2, \dots},\qquad p \in [0, 1] ]
Interpretation:
n is the number of independent trials
p is the probability of “success” on each trial
2) Intuition & Motivation#
If you repeat the same yes/no experiment n times, and each trial succeeds with probability p, then
[ X \sim \mathrm{Binomial}(n, p) ]
means:
X counts how many successes you got out of n.
What this distribution models#
counting successes in a fixed number of independent trials
each trial has the same success probability
Typical real-world use cases#
A/B tests: number of conversions out of n visitors
quality control: number of defective items in a batch (with independence as an approximation)
reliability: number of working components out of n
epidemiology: number of positive tests out of n (when tests are conditionally independent)
Relations to other distributions#
Bernoulli: if n = 1, then Binomial(1, p) is Bernoulli(p).
Poisson approximation: if n is large and p is small with (\lambda = np) fixed, then (\mathrm{Binomial}(n,p) \approx \mathrm{Poisson}(\lambda)).
Normal approximation (CLT): for large n with (np(1-p)) not tiny, (X \approx \mathcal{N}(np,\ np(1-p))) (often with a continuity correction).
Beta–Binomial: if p itself is random with a Beta prior, the marginal count distribution becomes beta–binomial.
Negative binomial: counts trials needed to reach a fixed number of successes (“fixed successes” vs “fixed trials”).
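The Poisson approximation above is easy to probe numerically. As a sketch (using scipy.stats, which this notebook imports later), the total variation distance between Binomial(n, p) and Poisson(np) shrinks as n grows with (\lambda = np) held fixed:

```python
import numpy as np
from scipy.stats import binom, poisson

def tv_distance(n, p):
    # Total variation distance between Binomial(n, p) and Poisson(np),
    # computed on {0, ..., n}; Poisson mass beyond n is negligible here.
    ks = np.arange(n + 1)
    return 0.5 * np.abs(binom.pmf(ks, n, p) - poisson.pmf(ks, n * p)).sum()

# Same lambda = np = 3, increasingly "large n / small p":
for n_, p_ in [(30, 0.1), (300, 0.01), (3000, 0.001)]:
    print(n_, p_, tv_distance(n_, p_))
```

The distances decrease roughly in proportion to p, consistent with the "n large, p small" regime.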
3) Formal Definition#
Let (X) be the number of successes in (n) independent Bernoulli trials with success probability (p).
PMF#
For (k \in {0,1,\dots,n}):
[ \mathbb{P}(X = k) = \binom{n}{k} p^k (1-p)^{n-k} ]
and (\mathbb{P}(X=k)=0) for integers (k\notin{0,\dots,n}).
CDF#
For a real number (x), the CDF is
[ F(x) = \mathbb{P}(X \le x) = \sum_{k=0}^{\lfloor x \rfloor} \binom{n}{k} p^k (1-p)^{n-k} ]
with the conventions (F(x)=0) for (x<0) and (F(x)=1) for (x\ge n).
A useful special-function identity (often used for numerical evaluation) is:
[ \mathbb{P}(X \le k) = I_{1-p}(n-k,\ k+1) ]
where (I) is the regularized incomplete beta function.
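This identity can be checked directly with scipy.special.betainc, which implements the regularized incomplete beta function. A quick numerical sketch (the identity needs (n-k>0), so we stop at (k=n-1)):

```python
import numpy as np
from scipy.special import betainc
from scipy.stats import binom

n, p = 10, 0.3
ks = np.arange(n)                     # k = 0, ..., n-1 so that n - k > 0
lhs = binom.cdf(ks, n, p)
rhs = betainc(n - ks, ks + 1, 1 - p)  # I_{1-p}(n-k, k+1)
print(np.max(np.abs(lhs - rhs)))      # agreement to floating-point precision
```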
4) Moments & Properties#
Mean and variance#
[ \mathbb{E}[X] = np,\qquad \mathrm{Var}(X) = np(1-p) ]
Skewness and kurtosis#
Let (\sigma^2 = np(1-p)). Then
[ \text{skew}(X) = \frac{1-2p}{\sqrt{np(1-p)}} ]
The excess kurtosis is
[ \text{excess kurt}(X) = \frac{1 - 6p(1-p)}{np(1-p)} ]
(so the full kurtosis is (3 + \text{excess kurt}(X))).
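These formulas can be cross-checked against scipy.stats.binom.stats, which returns the mean, variance, skewness, and excess (Fisher) kurtosis:

```python
import math
from scipy.stats import binom

n, p = 40, 0.25
mean, var, skew, exkurt = (float(m) for m in binom.stats(n, p, moments="mvsk"))

# Compare against the closed-form expressions above.
sigma2 = n * p * (1 - p)
print(mean, var)        # np and np(1-p)
print(skew, exkurt)     # (1-2p)/sigma and (1 - 6p(1-p))/sigma^2
```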
MGF and characteristic function#
With (q = 1-p),
[ M_X(t) = \mathbb{E}[e^{tX}] = (q + p e^t)^n ]
[ \varphi_X(t) = \mathbb{E}[e^{itX}] = (q + p e^{it})^n ]
Entropy#
The (Shannon) entropy is
[ H(X) = -\sum_{k=0}^{n} \mathbb{P}(X=k)\,\log \mathbb{P}(X=k) ]
There is no simple closed form in general, but it is easy to compute numerically for moderate n.
Other useful properties#
Mode: (\lfloor (n+1)p \rfloor); when ((n+1)p) is an integer, both ((n+1)p - 1) and ((n+1)p) are modes.
Additivity (same p): if (X\sim\text{Bin}(n_1,p)) and (Y\sim\text{Bin}(n_2,p)) are independent, then (X+Y\sim\text{Bin}(n_1+n_2,p)).
Complement symmetry: if (X\sim\text{Bin}(n,p)), then (n-X\sim\text{Bin}(n,1-p)).
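The mode and additivity properties can be verified numerically. A sketch using scipy.stats.binom, with additivity checked by convolving the two PMFs (the distribution of a sum of independent variables is the convolution of their PMFs):

```python
import numpy as np
from scipy.stats import binom

# Mode: argmax of the PMF equals floor((n+1)p) when (n+1)p is not an integer.
n, p = 20, 0.3                              # (n+1)p = 6.3 -> mode 6
ks = np.arange(n + 1)
mode = int(ks[np.argmax(binom.pmf(ks, n, p))])

# Additivity: PMF of X + Y (independent, same p) matches Bin(n1+n2, p).
n1, n2 = 7, 12
pmf_sum = np.convolve(binom.pmf(np.arange(n1 + 1), n1, p),
                      binom.pmf(np.arange(n2 + 1), n2, p))
additive_ok = np.allclose(pmf_sum, binom.pmf(np.arange(n1 + n2 + 1), n1 + n2, p))
print(mode, additive_ok)
```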
def _validate_n_p(n, p):
if isinstance(n, bool) or not isinstance(n, (int, np.integer)):
raise TypeError("n must be an integer")
n_int = int(n)
if n_int < 0:
raise ValueError("n must be >= 0")
p_float = float(p)
if not (0.0 <= p_float <= 1.0):
raise ValueError("p must be in [0, 1]")
return n_int, p_float
def binom_logpmf(k, n, p):
# Log PMF for Bin(n,p) for integer k. Returns -inf outside support.
n, p = _validate_n_p(n, p)
k_arr = np.asarray(k)
out = np.full(k_arr.shape, -np.inf, dtype=float)
k_int = k_arr.astype(int)
valid = (k_int == k_arr) & (k_int >= 0) & (k_int <= n)
if not np.any(valid):
return out
if p == 0.0:
out[valid & (k_int == 0)] = 0.0
return out
if p == 1.0:
out[valid & (k_int == n)] = 0.0
return out
kv = k_int[valid]
log_binom_coeff = (
math.lgamma(n + 1)
- np.vectorize(math.lgamma)(kv + 1)
- np.vectorize(math.lgamma)(n - kv + 1)
)
out[valid] = log_binom_coeff + kv * math.log(p) + (n - kv) * math.log1p(-p)
return out
def binom_pmf(k, n, p):
return np.exp(binom_logpmf(k, n, p))
def binom_pmf_support(n):
n, _ = _validate_n_p(n, 0.5)
return np.arange(n + 1)
def binom_pmf_array(n, p):
ks = binom_pmf_support(n)
return ks, binom_pmf(ks, n, p)
def binom_cdf(x, n, p):
n, p = _validate_n_p(n, p)
x_arr = np.asarray(x)
_, pmf = binom_pmf_array(n, p)
cdf = np.cumsum(pmf)
cdf = np.clip(cdf, 0.0, 1.0)
k = np.floor(x_arr).astype(int)
out = np.zeros_like(x_arr, dtype=float)
out[x_arr >= n] = 1.0
inside = (x_arr >= 0) & (x_arr < n)
if np.any(inside):
out[inside] = cdf[k[inside]]
return out
def binom_moments(n, p):
n, p = _validate_n_p(n, p)
mean = n * p
var = n * p * (1.0 - p)
if var == 0.0:
skew = float("nan")
excess_kurt = float("nan")
else:
skew = (1.0 - 2.0 * p) / math.sqrt(var)
excess_kurt = (1.0 - 6.0 * p * (1.0 - p)) / var
return {
"mean": mean,
"var": var,
"skew": skew,
"excess_kurt": excess_kurt,
}
def binom_entropy(n, p, *, base=math.e):
n, p = _validate_n_p(n, p)
_, pmf = binom_pmf_array(n, p)
mask = pmf > 0
H_nats = -np.sum(pmf[mask] * np.log(pmf[mask]))
if base == math.e:
return float(H_nats)
return float(H_nats / math.log(base))
n, p = 40, 0.25
moments = binom_moments(n, p)
moments
{'mean': 10.0,
'var': 7.5,
'skew': 0.18257418583505536,
'excess_kurt': -0.016666666666666666}
# Monte Carlo check (matches formulas up to sampling error)
samples = rng.binomial(n=n, p=p, size=200_000)
est_mean = samples.mean()
est_var = samples.var(ddof=0)
{
"formula_mean": moments["mean"],
"mc_mean": float(est_mean),
"formula_var": moments["var"],
"mc_var": float(est_var),
"entropy_nats": binom_entropy(n, p),
}
{'formula_mean': 10.0,
'mc_mean': 9.999065,
'formula_var': 7.5,
'mc_var': 7.510134125775002,
'entropy_nats': 2.423340794934518}
5) Parameter Interpretation#
n (number of trials) controls the scale and granularity. Larger n tends to make the distribution wider and (often) more bell-shaped.
p (success probability) controls the location and skew:
if p < 0.5, most mass is near 0 with right skew
if p = 0.5, the distribution is symmetric around n/2
if p > 0.5, most mass is near n with left skew
Two helpful identities: [ \mathbb{E}[X]=np,\qquad \mathrm{Var}(X)=np(1-p) ]
so increasing p shifts the distribution right, and increasing n typically increases both the mean and the variance.
from plotly.subplots import make_subplots
n_fixed = 20
p_values = [0.1, 0.3, 0.5, 0.7, 0.9]
p_fixed = 0.3
n_values = [5, 20, 100]
fig = make_subplots(
rows=1,
cols=2,
subplot_titles=(
f"PMF vs p (n={n_fixed})",
f"PMF vs n (p={p_fixed})",
),
)
ks = binom_pmf_support(n_fixed)
for p_ in p_values:
fig.add_trace(
go.Scatter(
x=ks,
y=binom_pmf(ks, n_fixed, p_),
mode="markers+lines",
name=f"p={p_}",
),
row=1,
col=1,
)
for n_ in n_values:
ks = binom_pmf_support(n_)
fig.add_trace(
go.Scatter(
x=ks,
y=binom_pmf(ks, n_, p_fixed),
mode="markers+lines",
name=f"n={n_}",
),
row=1,
col=2,
)
fig.update_xaxes(title_text="k", row=1, col=1)
fig.update_yaxes(title_text="P(X=k)", row=1, col=1)
fig.update_xaxes(title_text="k", row=1, col=2)
fig.update_yaxes(title_text="P(X=k)", row=1, col=2)
fig.update_layout(height=420, legend_title_text="")
fig.show()
6) Derivations#
A standard way to derive binomial moments is to write (X) as a sum of indicator variables.
Expectation#
Let (I_i) indicate success on trial (i): (I_i\in{0,1}) with (\mathbb{P}(I_i=1)=p).
Then [ X = \sum_{i=1}^{n} I_i ]
and by linearity of expectation: [ \mathbb{E}[X] = \sum_{i=1}^n \mathbb{E}[I_i] = \sum_{i=1}^n p = np ]
Variance#
If trials are independent, (\mathrm{Cov}(I_i,I_j)=0) for (i\ne j). So:
[ \mathrm{Var}(X) = \sum_{i=1}^{n} \mathrm{Var}(I_i) = \sum_{i=1}^n p(1-p) = np(1-p) ]
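The indicator-sum representation can be confirmed by simulating the (I_i) directly and comparing sample moments to (np) and (np(1-p)):

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, size = 12, 0.3, 200_000
indicators = rng.random((size, n)) < p   # I_1, ..., I_n for each replicate
x = indicators.sum(axis=1)               # X = sum of the indicators

print(x.mean(), n * p)                   # sample mean vs np = 3.6
print(x.var(), n * p * (1 - p))          # sample variance vs np(1-p) = 2.52
```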
Likelihood (single observation)#
If you observe (X=k) with known n, the likelihood of p is
[ L(p\mid k) = \binom{n}{k} p^k (1-p)^{n-k} ]
Taking logs (and dropping constants that do not depend on p):
[ \ell(p) = k\log p + (n-k)\log(1-p) ]
Differentiate and set to zero:
[ \ell'(p) = \frac{k}{p} - \frac{n-k}{1-p} = 0 \quad\Longrightarrow\quad \hat p = \frac{k}{n} ]
So the MLE is the observed success fraction.
# Visualize the likelihood for p (single observation)
n = 30
k_obs = 9
p_grid = np.linspace(1e-6, 1 - 1e-6, 600)
log_binom_coeff = math.lgamma(n + 1) - math.lgamma(k_obs + 1) - math.lgamma(n - k_obs + 1)
logL = log_binom_coeff + k_obs * np.log(p_grid) + (n - k_obs) * np.log1p(-p_grid)
p_hat = k_obs / n
fig = go.Figure()
fig.add_trace(go.Scatter(x=p_grid, y=logL, mode="lines", name="log-likelihood"))
fig.add_vline(x=p_hat, line_dash="dash", line_color="black", annotation_text=f"MLE p̂={p_hat:.3f}")
fig.update_layout(title="Binomial log-likelihood for p (n fixed)", xaxis_title="p", yaxis_title="log L(p)")
fig.show()
7) Sampling & Simulation#
Below are two NumPy-only sampling strategies.
A) Sum of Bernoulli trials#
Directly simulate n Bernoulli trials and count successes.
Correct by construction
Cost: (O(n\cdot \text{size}))
B) Inverse CDF (inverse transform sampling)#
Compute probabilities (\mathbb{P}(X=k)) for (k=0,\dots,n)
Compute the cumulative sum to get the CDF on the support
Draw (U\sim\text{Uniform}(0,1)) and return the smallest (k) with (F(k)\ge U)
Cost: (O(n + \text{size}\log n)) with binary search (np.searchsorted)
Useful for didactic purposes; production implementations use more specialized algorithms
def sample_binom_bernoulli_sum(n, p, size=1, *, rng: np.random.Generator):
n, p = _validate_n_p(n, p)
u = rng.random(size=np.atleast_1d(size).tolist() + [n])
return (u < p).sum(axis=-1)
def sample_binom_inverse_cdf(n, p, size=1, *, rng: np.random.Generator):
n, p = _validate_n_p(n, p)
if p == 0.0:
return np.zeros(size, dtype=int)
if p == 1.0:
return np.full(size, n, dtype=int)
ks = np.arange(n + 1)
# Compute PMF stably via log-space, then normalize.
logp = binom_logpmf(ks, n, p)
logp = logp - np.max(logp)
pmf = np.exp(logp)
pmf = pmf / pmf.sum()
cdf = np.cumsum(pmf)
cdf[-1] = 1.0
u = rng.random(size=size)
return np.searchsorted(cdf, u, side="left")
n, p = 25, 0.35
size = 50_000
x1 = sample_binom_bernoulli_sum(n, p, size=size, rng=rng)
x2 = sample_binom_inverse_cdf(n, p, size=size, rng=rng)
{
"bernoulli_sum_mean": float(x1.mean()),
"inverse_cdf_mean": float(x2.mean()),
"theory_mean": n * p,
}
{'bernoulli_sum_mean': 8.74378,
'inverse_cdf_mean': 8.74036,
'theory_mean': 8.75}
8) Visualization#
We’ll visualize:
the PMF on ({0,\dots,n})
the CDF (as a step function)
Monte Carlo samples vs the exact PMF
n, p = 30, 0.4
ks, pmf = binom_pmf_array(n, p)
cdf = np.cumsum(pmf)
fig_pmf = go.Figure()
fig_pmf.add_trace(go.Bar(x=ks, y=pmf, name="PMF"))
fig_pmf.update_layout(title=f"Binomial PMF (n={n}, p={p})", xaxis_title="k", yaxis_title="P(X=k)")
fig_pmf.show()
fig_cdf = go.Figure()
fig_cdf.add_trace(go.Scatter(x=ks, y=cdf, mode="lines", line_shape="hv", name="CDF"))
fig_cdf.update_layout(title=f"Binomial CDF (n={n}, p={p})", xaxis_title="k", yaxis_title="P(X≤k)")
fig_cdf.show()
mc = sample_binom_inverse_cdf(n, p, size=200_000, rng=rng)
counts = np.bincount(mc, minlength=n + 1)
pmf_hat = counts / counts.sum()
fig_mc = go.Figure()
fig_mc.add_trace(go.Bar(x=ks, y=pmf_hat, name="Monte Carlo", opacity=0.6))
fig_mc.add_trace(go.Scatter(x=ks, y=pmf, mode="markers+lines", name="Exact PMF"))
fig_mc.update_layout(
title=f"Monte Carlo vs exact PMF (n={n}, p={p})",
xaxis_title="k",
yaxis_title="Probability",
)
fig_mc.show()
9) SciPy Integration#
SciPy provides a fast, numerically robust implementation via scipy.stats.binom.
Use pmf, cdf, sf (survival function), rvs, logpmf, logcdf, …
For parameter estimation, binom does not currently expose a .fit() method (SciPy 1.15). For n known, the MLE for p is closed-form; otherwise you can do custom optimization.
from scipy.optimize import minimize_scalar
from scipy.stats import binom
n, p = 12, 0.2
ks = np.arange(n + 1)
pmf_scipy = binom.pmf(ks, n, p)
cdf_scipy = binom.cdf(ks, n, p)
samples_scipy = binom.rvs(n, p, size=10_000, random_state=rng)
{
"pmf_sum": float(pmf_scipy.sum()),
"cdf_last": float(cdf_scipy[-1]),
"sample_mean": float(samples_scipy.mean()),
"theory_mean": n * p,
}
{'pmf_sum': 1.0,
'cdf_last': 1.0,
'sample_mean': 2.4082,
'theory_mean': 2.4000000000000004}
# "Fit" p with n known (closed-form MLE + SciPy optimization check)
n_true, p_true = 20, 0.35
data = binom.rvs(n_true, p_true, size=2_000, random_state=rng)
p_mle_closed = data.mean() / n_true
def nll(p):
if not (0.0 < p < 1.0):
return float("inf")
return -binom.logpmf(data, n_true, p).sum()
res = minimize_scalar(nll, bounds=(1e-9, 1 - 1e-9), method="bounded")
{
"p_true": p_true,
"p_mle_closed": float(p_mle_closed),
"p_mle_opt": float(res.x),
"opt_success": bool(res.success),
}
{'p_true': 0.35,
'p_mle_closed': 0.348475,
'p_mle_opt': 0.3484751594559644,
'opt_success': True}
10) Statistical Use Cases#
A) Hypothesis testing (exact binomial test)#
Test whether a success probability equals a reference value (p_0).
B) Bayesian modeling (Beta–Binomial conjugacy)#
If (p\sim\text{Beta}(\alpha,\beta)) and (X\mid p \sim\text{Bin}(n,p)), then the posterior is
[ p\mid X=k \sim \text{Beta}(\alpha + k,\ \beta + n - k) ]
This is one of the cleanest examples of conjugacy.
C) Generative modeling (counts conditioned on probabilities)#
Binomial observations show up whenever you generate counts given probabilities, e.g. modeling conversions, click-throughs, and success/failure outcomes at scale.
from scipy.stats import beta, betabinom, binomtest
# A) Hypothesis test
n = 50
k = 18
p0 = 0.25
test = binomtest(k=k, n=n, p=p0, alternative="two-sided")
ci = test.proportion_ci(confidence_level=0.95)
{
"k": k,
"n": n,
"p0": p0,
"p_value": float(test.pvalue),
"95%_CI": (float(ci.low), float(ci.high)),
}
{'k': 18,
'n': 50,
'p0': 0.25,
'p_value': 0.10037920682387039,
'95%_CI': (0.22915706682058734, 0.5080686477145561)}
# B) Bayesian update with Beta prior
alpha0, beta0 = 2.0, 2.0
n, k = 50, 18
alpha_post = alpha0 + k
beta_post = beta0 + (n - k)
posterior_mean = alpha_post / (alpha_post + beta_post)
posterior_ci = beta.ppf([0.025, 0.975], alpha_post, beta_post)
x = np.linspace(0, 1, 400)
fig = go.Figure()
fig.add_trace(go.Scatter(x=x, y=beta.pdf(x, alpha0, beta0), mode="lines", name="prior Beta(2,2)"))
fig.add_trace(
go.Scatter(
x=x,
y=beta.pdf(x, alpha_post, beta_post),
mode="lines",
name=f"posterior Beta({alpha_post:.0f},{beta_post:.0f})",
)
)
fig.add_vline(
x=posterior_mean,
line_dash="dash",
line_color="black",
annotation_text=f"posterior mean={posterior_mean:.3f}",
)
fig.update_layout(title="Beta prior → Beta posterior for p", xaxis_title="p", yaxis_title="density")
fig.show()
{
"posterior_mean": float(posterior_mean),
"posterior_95%_CI": (float(posterior_ci[0]), float(posterior_ci[1])),
}
{'posterior_mean': 0.37037037037037035,
'posterior_95%_CI': (0.24787219846491212, 0.5019667760094333)}
# Posterior predictive: future successes out of m trials
m = 20
k_new = np.arange(m + 1)
pred_pmf = betabinom.pmf(k_new, m, alpha_post, beta_post)
fig = go.Figure()
fig.add_trace(go.Bar(x=k_new, y=pred_pmf, name="posterior predictive"))
fig.update_layout(
title=f"Posterior predictive for X_new | data (m={m})",
xaxis_title="k_new",
yaxis_title="P(X_new=k_new)",
)
fig.show()
# C) Simple generative example: daily conversions
days = 60
visitors = rng.integers(200, 1200, size=days)
# latent daily conversion rate (captures extra variability beyond pure binomial)
p_day = rng.beta(30, 70, size=days)
conversions = np.array([rng.binomial(n=v, p=p) for v, p in zip(visitors, p_day)])
df = {
"day": np.arange(days),
"visitors": visitors,
"conversions": conversions,
"rate": conversions / visitors,
}
fig = go.Figure()
fig.add_trace(go.Scatter(x=df["day"], y=df["rate"], mode="markers+lines", name="observed rate"))
fig.update_layout(title="Simulated conversion rates", xaxis_title="day", yaxis_title="conversions / visitors")
fig.show()
{
"avg_visitors": float(np.mean(visitors)),
"avg_rate": float(np.mean(df["rate"])),
}
{'avg_visitors': 623.9833333333333, 'avg_rate': 0.3130981451528444}
11) Pitfalls#
Invalid parameters#
n must be a non-negative integer.
p must lie in ([0,1]).
Numerical issues#
For large n, the PMF can underflow in floating point. Prefer log-space computations: scipy.stats.binom.logpmf, logcdf, etc.
For tail probabilities, prefer sf (survival function) over 1 - cdf to avoid catastrophic cancellation.
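The sf-vs-(1 - cdf) point is easy to demonstrate in the far tail, where the cdf rounds to 1.0 in double precision and the subtraction loses everything, while sf retains the tiny probability:

```python
from scipy.stats import binom

n, p, k = 1000, 0.5, 800
naive = 1.0 - binom.cdf(k, n, p)   # cancellation: all precision lost
direct = binom.sf(k, n, p)         # computes the upper tail directly
print(naive, direct)               # naive collapses; direct stays tiny but positive
```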
Modeling issues#
Independence can be violated (e.g. clustered trials, repeated users). A beta–binomial model can capture over-dispersion.
Constant p across trials may be unrealistic (mixture models / hierarchical models often fit better).
For sampling without replacement, a hypergeometric distribution is more appropriate.
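Over-dispersion from a varying p is easy to see in simulation. A sketch (with a hypothetical Beta(6, 14) distribution for the daily p) comparing the sample variance of beta–binomial counts to the plain binomial variance at the same mean rate:

```python
import numpy as np

rng = np.random.default_rng(0)
n, size = 50, 100_000
alpha, beta_ = 6.0, 14.0                  # hypothetical: p ~ Beta(6, 14), mean 0.3
p_draws = rng.beta(alpha, beta_, size=size)
x = rng.binomial(n, p_draws)              # beta-binomial counts

p_bar = alpha / (alpha + beta_)
var_if_fixed_p = n * p_bar * (1 - p_bar)  # np(1-p) = 10.5 if p were fixed at 0.3
print(x.var(), var_if_fixed_p)            # sample variance is much larger
```

A plain binomial fit to such data would badly understate the spread; the beta–binomial absorbs the extra variance.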
Approximation gotchas#
Poisson approximation requires n large, p small, and (np) moderate.
Normal approximation is best when (np(1-p)) is reasonably large (rule of thumb: (\ge 10)).
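A quick sketch of the normal approximation against the exact CDF, with and without the continuity correction mentioned earlier:

```python
import numpy as np
from scipy.stats import binom, norm

n, p, k = 100, 0.4, 45
exact = binom.cdf(k, n, p)
mu, sigma = n * p, np.sqrt(n * p * (1 - p))

plain = norm.cdf((k - mu) / sigma)            # no continuity correction
corrected = norm.cdf((k + 0.5 - mu) / sigma)  # continuity correction: k -> k + 1/2
print(exact, plain, corrected)
```

Here (np(1-p) = 24), so the approximation is in its comfortable regime; the corrected value lands much closer to the exact CDF.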
12) Summary#
binom is a discrete distribution on ({0,\dots,n}) counting successes in n Bernoulli trials.
PMF: (\binom{n}{k}p^k(1-p)^{n-k}), mean (np), variance (np(1-p)).
Derivations are clean via the indicator-sum representation (X=\sum I_i).
For computation and tail probabilities, prefer SciPy's binom (especially in log-space).
When p varies across trials or trials aren't independent, consider richer models (e.g. beta–binomial).
References
SciPy: scipy.stats.binom and scipy.stats.binomtest
Any standard probability text (e.g., Casella & Berger, Statistical Inference)